AITopics | consecutive frame


consecutive frame


Quadratic Video Interpolation

Xiangyu Xu, Li Siyao, Wenxiu Sun, Qian Yin, Ming-Hsuan Yang

Neural Information Processing Systems | Mar-13-2026, 12:36:51 GMT

Neural Information Processing Systems http://nips.cc/

dataset, interpolation, video, (16 more...)


Country:

North America > United States > New Jersey (0.04)
North America > United States > California > Merced County > Merced (0.04)
North America > Canada (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > Promising Solution (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)


fed6d142d12b2f8c031615cc8fd50893-Paper-Conference.pdf

Neural Information Processing Systems | Feb-18-2026, 20:14:25 GMT

adverse weather condition, information, semantic segmentation, (13 more...)


Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > Singapore (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Education (0.96)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)


Temporal Coherency based Criteria for Predicting Video Frames using Deep Multi-stage Generative Adversarial Networks

Prateep Bhattacharjee, Sukhendu Das

Neural Information Processing Systems | Nov-21-2025, 15:57:49 GMT

The proposed method uses two stages of GANs to generate a crisp and clear set of future frames.

artificial intelligence, machine learning, objective function, (17 more...)


Country:

North America > United States > California > Los Angeles County > Long Beach (0.04)
Asia > India > Tamil Nadu > Chennai (0.04)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)


End-to-End Video Semantic Segmentation in Adverse Weather using Fusion Blocks and Temporal-Spatial Teacher-Student Learning

Xin Yang, Yan Wending, Michael Bi Mi, Yuan

Neural Information Processing Systems | Oct-10-2025, 22:38:36 GMT

The key idea of our fusion block is to offer the model a way to merge information from consecutive frames by matching and merging relevant pixels from those frames.

adverse weather condition, information, semantic segmentation, (13 more...)


Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > Singapore (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)


Biological Learning of Irreducible Representations of Commuting Transformations

Neural Information Processing Systems | Aug-16-2025, 13:37:37 GMT

A longstanding challenge in neuroscience is to understand neural mechanisms underlying the brain's remarkable ability to learn and detect transformations of objects

algorithm, artificial intelligence, machine learning, (16 more...)


Country: North America > United States (0.04)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)


Supplementary Materials for "Temporal-attentive Covariance Pooling Networks for Video Recognition"

Zilin Gao

Neural Information Processing Systems | Aug-15-2025, 03:29:14 GMT

They obtain 0.5% and 0.7% gains over those with

dataset, temporal-attentive covariance pooling network, supplementary material, (10 more...)


Country:

Asia > China > Tianjin Province > Tianjin (0.05)
Asia > China > Liaoning Province > Dalian (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.71)


Lightweight Multi-Frame Integration for Robust YOLO Object Detection in Videos

Quan, Yitong, Kiefer, Benjamin, Messmer, Martin, Zell, Andreas

arXiv.org Artificial Intelligence | Jun-26-2025

Modern image-based object detection models, such as YOLOv7, primarily process individual frames independently, thus ignoring valuable temporal context naturally present in videos. Meanwhile, existing video-based detection methods often introduce complex temporal modules, significantly increasing model size and computational complexity. In practical applications such as surveillance and autonomous driving, transient challenges including motion blur, occlusions, and abrupt appearance changes can severely degrade single-frame detection performance. To address these issues, we propose a straightforward yet highly effective strategy: stacking multiple consecutive frames as input to a YOLO-based detector while supervising only the output corresponding to a single target frame. This approach leverages temporal information with minimal modifications to existing architectures, preserving simplicity, computational efficiency, and real-time inference capability. Extensive experiments on the challenging MOT20Det and our BOAT360 datasets demonstrate that our method improves detection robustness, especially for lightweight models, effectively narrowing the gap between compact and heavy detection networks. Additionally, we contribute the BOAT360 benchmark dataset, comprising annotated fisheye video sequences captured from a boat, to support future research in multi-frame video object detection in challenging real-world scenarios.
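The input construction described in this abstract (stacking consecutive frames and supervising only one target frame) can be illustrated with a minimal sketch. This is not the authors' code: the helper names `window_indices`, `stack_frames`, and `make_samples` are hypothetical, the target frame is assumed to be the last frame of each window, and indices before the start of the clip are assumed to be clamped to the first frame.

```python
from typing import Any, List, Sequence, Tuple

Channel = List[List[float]]  # one H x W plane
Frame = List[Channel]        # C planes, channel-first layout


def window_indices(target: int, num_stacked: int) -> List[int]:
    """Indices of the `num_stacked` consecutive frames ending at `target`.

    Assumption: indices before the start of the clip are clamped to 0,
    so the first frame is repeated rather than the window being dropped.
    """
    return [max(0, target - offset) for offset in range(num_stacked - 1, -1, -1)]


def stack_frames(video: Sequence[Frame], target: int, num_stacked: int) -> Frame:
    """Concatenate the frames of one window along the channel axis."""
    stacked: Frame = []
    for idx in window_indices(target, num_stacked):
        stacked.extend(video[idx])
    return stacked


def make_samples(
    video: Sequence[Frame], labels: Sequence[Any], num_stacked: int
) -> List[Tuple[Frame, Any]]:
    """Pair each stacked input with the labels of its target frame only."""
    return [
        (stack_frames(video, t, num_stacked), labels[t])
        for t in range(len(video))
    ]
```

With three stacked RGB frames, each sample has 9 input channels, so only the first layer of a detector would need to accept a wider input; the labels remain those of a single frame.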

artificial intelligence, detection, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2506.2055

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology (0.49)
Transportation > Ground > Road (0.34)
Automobiles & Trucks (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)


MARVIS: Motion & Geometry Aware Real and Virtual Image Segmentation

Wu, Jiayi, Lin, Xiaomin, Negahdaripour, Shahriar, Fermüller, Cornelia, Aloimonos, Yiannis

arXiv.org Artificial Intelligence | Apr-16-2025

Tasks such as autonomous navigation, 3D reconstruction, and object recognition near water surfaces are crucial in marine robotics applications. However, challenges arise due to dynamic disturbances, e.g., light reflections and refraction from the random air-water interface, irregular liquid flow, and similar factors, which can lead to potential failures in perception and navigation systems. Traditional computer vision algorithms struggle to differentiate between real and virtual image regions, significantly complicating these tasks. A virtual image region is an apparent representation formed by the redirection of light rays, typically through reflection or refraction, creating the illusion of an object's presence without its actual physical location. This work proposes a novel approach for segmenting real and virtual image regions, exploiting synthetic images combined with domain-invariant information, a Motion Entropy Kernel, and Epipolar Geometric Consistency. Our segmentation network does not need to be re-trained if the domain changes. We show this by deploying the same segmentation network in two different domains: simulation and the real world. By creating realistic synthetic images that mimic the complexities of the water surface, we provide fine-grained training data for our network (MARVIS) to discern between real and virtual images effectively. Through motion- and geometry-aware design choices and comprehensive experimental analysis, we achieve state-of-the-art real-virtual image segmentation performance in an unseen real-world domain, achieving an IoU over 78% and an F1-score over 86% while ensuring a small computational footprint. MARVIS offers over 43 FPS (8 FPS) inference rates on a single GPU (CPU core). Our code and dataset are available at https://github.com/jiayi-wu-umd/MARVIS.
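For readers unfamiliar with the two metrics reported in this abstract, the following is a minimal sketch of how IoU and F1 are commonly computed for a binary segmentation mask. It is not taken from the MARVIS code; the function name `iou_and_f1` and the flat-list mask representation are assumptions made for illustration, as is the convention of returning 1.0 when both masks are empty.

```python
from typing import Sequence, Tuple


def iou_and_f1(pred: Sequence[int], truth: Sequence[int]) -> Tuple[float, float]:
    """Return (IoU, F1) for two flattened binary masks of equal length."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")

    # Count true positives, false positives, and false negatives per pixel.
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)

    union = tp + fp + fn
    if union == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0, 1.0

    iou = tp / union                   # |A ∩ B| / |A ∪ B|
    f1 = 2 * tp / (2 * tp + fp + fn)   # harmonic mean of precision and recall
    return iou, f1
```

The two scores are related by F1 = 2·IoU / (1 + IoU), so F1 is never lower than IoU for the same prediction.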

machine learning, natural language, segmentation, (20 more...)

arXiv.org Artificial Intelligence

2403.0985

Country: North America > United States > Maryland (0.28)

Genre: Research Report > Promising Solution (0.34)

Industry: Food & Agriculture (0.93)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)


Logic-RAG: Augmenting Large Multimodal Models with Visual-Spatial Knowledge for Road Scene Understanding

Kabir, Imran, Reza, Md Alimoor, Billah, Syed

arXiv.org Artificial Intelligence | Mar-16-2025

Large multimodal models (LMMs) are increasingly integrated into autonomous driving systems for user interaction. However, their limitations in fine-grained spatial reasoning pose challenges for system interpretability and user trust. We introduce Logic-RAG, a novel Retrieval-Augmented Generation (RAG) framework that improves LMMs' spatial understanding in driving scenarios. Logic-RAG constructs a dynamic knowledge base (KB) about object-object relationships in first-order logic (FOL) using a perception module, a query-to-logic embedder, and a logical inference engine. We evaluated Logic-RAG on visual-spatial queries using both synthetic and real-world driving videos. When using popular LMMs (GPT-4V, Claude 3.5) as proxies for an autonomous driving system, these models achieved only 55% accuracy on synthetic driving scenes and under 75% on real-world driving scenes. Augmenting them with Logic-RAG increased their accuracies to over 80% and 90%, respectively. An ablation study showed that even without logical inference, the fact-based context constructed by Logic-RAG alone improved accuracy by 15%. Logic-RAG is extensible: it allows seamless replacement of individual components with improved versions and enables domain experts to compose new knowledge in both FOL and natural language. In sum, Logic-RAG addresses critical spatial reasoning deficiencies in LMMs for autonomous driving applications. Code and data are available at https://github.com/Imran2205/LogicRAG.

large language model, logic & formal reasoning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2503.12663

Country: